ABSTRACT

It is common for supervisors to evaluate their 
supervisees with a rating form. Despite the importance of supervisor 
ratings to the training of counselors and therapists, very little
attention has been devoted to the overall reliability 
(generalizability) of these ratings. This study examined the 
generalizability of supervisor ratings of counselors-in-training. 
Participants included 23 counselor trainees enrolled in a master's-level
prepracticum course and 9 doctoral-level counseling
supervisors. Ratings of counselor and supervisor effectiveness were
collected through the use of the Counselor Effectiveness Scale. At
the beginning of the term, practicum trainees were randomly assigned
to supervisors. Each prepracticum counselor audiotaped a counseling
session with a volunteer client on each of 6 weeks. Within a week 
following each counseling session, counselors met with their 
supervisors for a 60-minute supervision session. Following each
supervision session, the supervisor rated the effectiveness of the
counselor, and the counselor rated the effectiveness of the
supervisor. Generalizability analyses were performed. Results showed
that the generalizability of supervisor's ratings of counselor
effectiveness was affected more by the number of occasions on which the counselor
was rated than by the length of the rating instrument. Similar 
findings were observed for counselor ratings of supervisor 
effectiveness. (ABL) 
Abstract

The generalizability of counselor and supervisor 
effectiveness ratings was investigated through data collected 
during supervision activities within a counseling practicum 
class. The generalizability of supervisor's ratings of counselor
effectiveness was affected more by the number of occasions on
which the counselor was rated than by the length of the rating 
instrument. Similar findings were observed for counselor ratings 
of supervisor effectiveness. 



Generalizability of Effectiveness Ratings 
for Counselors and Their Supervisors 

It is common, if not required, for supervisors to evaluate 
their supervisees with a rating form. As Stoltenberg and 
Delworth (1987) have recently indicated, it is such quantitative 
evaluation of a counselor's abilities that may comprise the full 
extent of information available to a new supervisor in planning 
the continued training of that counselor. "Although it is a 
common assumption that one can sort a room of therapists into 
good ones and others, the danger exists that the interpretations 
or inferences made by a supervisor may be misleading"
(Stoltenberg & Delworth, 1987, p. 113). 

Despite the importance of supervisor ratings to the training 
of counselors and therapists, very little attention has been 
devoted to the overall reliability (generalizability) of these 
ratings. The primary objective of this investigation was to 
study the generalizability of supervisor ratings of counselors- 
in-training. Specifically, the study was designed to answer the 
following basic question: If we were to devise an optimally
effective rating scale for a supervisee effectiveness rating, 
how many items would be rated on that scale, and how often (i.e., 
on how many different occasions) would we ask the supervisor to 
make those ratings? 

A second type of rating form common in supervision research 
is a trainee assessment of the supervisor. Stoltenberg and 
Delworth (1987) argue that trainee ratings may well be much more 
variable than those of supervisors since they are often a 
function of "how comfortable, and not how effective, supervision
was" (p. 118). They suggest that some considerable time may pass
after supervision is completed before some trainees can best
recognize the impact that supervision has had upon their
learning. Thus, it appeared very relevant to include in this
investigation a second important research question: What is the
optimal number of items, occasions, and counselor/supervisees to
discriminate among supervisors of varying abilities and 
characteristics? 

To address each of these questions, generalizability 
analysis was employed. Generalizability theory liberalizes and 
extends classical test theory. In particular, it allows for 
consideration of multiple sources of error by applying analysis 
of variance procedures to assess the dependability of 
measurements (Brennan, 1983; Cronbach, Gleser, Nanda, & 
Rajaratnam, 1972; Webb, Rowley, & Shavelson, 1988). 
Consequently, generalizability theory is applicable to a broad 
range of measurement, evaluation, and testing studies that arise 
in education and psychology. The counseling supervision 
literature has, to this point, included no generalizability analyses
of ratings of supervisee or supervisor effectiveness. 

Generalizability research begins by conducting a 
generalizability study (G-study). A design for data collection
is constructed to include the facets (i.e., variables) of 
interest. For example, a researcher might construct a design to 
investigate the generalizability of trained raters' evaluations
of counselor interventions. If the researcher were interested in
determining the generalizability of ratings across multiple
raters and multiple constructs (e.g., empathy, genuineness,
regard, concreteness), the design would include two measurement
facets, raters and constructs, and a single differentiation
facet, counselors. Data generation might involve having multiple
raters rate each of a number of counselors' taped interviews on
each of the constructs of interest.

On completion of the G-study, generalizability analysis 
allows the estimation of generalizability coefficients for 
hypothetical designs (D-studies) using the variance components
computed from observed data (the G-study). This use of
generalizability analysis is analogous to classical measurement
theory's use of the Spearman-Brown prophecy formula to estimate
the change in the reliability of a test if the number of items is 
increased or decreased. However, whereas classical methods are 
unidimensional (e.g., items are viewed as a single source of 
error), generalizability analysis permits multidimensional 
assessment of factors affecting one's ability to differentiate 
reliably among the objects of measurement (Brennan, 1983; Webb, 
Rowley, & Shavelson, 1988). 
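
The classical prediction mentioned above can be illustrated with a short sketch. The Spearman-Brown formula is standard; the starting reliability of .70 and the doubling factor are hypothetical values chosen only for illustration and do not come from this study.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened (or shortened).

    reliability   -- reliability of the current test
    length_factor -- new length divided by current length (2.0 = doubled)
    """
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical example: a scale with reliability .70 is doubled in length.
print(round(spearman_brown(0.70, 2.0), 3))  # 0.824
```

Note that the formula treats items as the only source of error, which is the limitation that generalizability analysis is meant to relax.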

Methods 

Participants 

The initial participants in this study included 23 counselor 
trainees (4 male, 19 female) enrolled in a master's-level
prepracticum course and 9 doctoral-level counseling supervisors
(4 male, 5 female). Most prepracticum students were enrolled in
their first term of work in the master's degree program in
counseling. About half of the 23 students had previously held a
position that was counseling-related to some extent. The
doctoral-level supervisors were currently enrolled in a course in
counselor supervision. Trainees were randomly assigned to
supervisors at the start of the practicum course. 
Instruments 

Ratings of counselor and supervisor effectiveness were 
collected through use of the Counselor Effectiveness Scale (CES, 
Ivey & Authier, 1978). This instrument consists of two forms 
with 25 semantic differential scaled items on each form. The 
first set of CES items, Form A, was used by supervisors to rate
their supervisees' effectiveness, while the second set, Form B,
was used by the supervisees to evaluate the effectiveness of
their supervisors. In a study of the concurrent validity of
counselor effectiveness instruments, Wilson and Yager (1987)
found the two CES scales to be highly correlated (r = .94, p <
.001). A more extensive review of the measurement
characteristics of this instrument has been presented by 
Ponterotto and Furlong (1985). 
Procedures 

At the beginning of the term, practicum trainees were 
randomly assigned to supervisors. Each prepracticum counselor
audiotaped a counseling session with a volunteer client on each
of six weeks. Within a week following each counseling session,
counselors met with their supervisors for a 60-minute supervision
session. Following each supervision session, the supervisor
rated the effectiveness of the counselor (CES - Form A) and, in
turn, the counselor rated the effectiveness of the supervisor
(CES - Form B). These ratings were returned directly to the
researcher to ensure confidentiality. During the course of the
investigation, rating forms were not obtained after some of the
supervisory sessions. As a result, not all supervisory
sessions were rated, and the final number of trainees who had
been rated at least six times was only 21 of the original 23
students.

To address the generalizability of ratings of counselor and 
supervisor effectiveness, estimated mean squares, variance 
components, error variances, and generalizability coefficients 
were computed using the General Purpose Analysis of Variance 
System, Version 2.2 (GENOVA, Crick & Brennan, 1984). Additional 
exploration of the reliabilities, correlations among variables, 
and mean differences among supervisors was accomplished through 
the use of appropriate SPSS (Nie, Hull, Jenkins, Steinbrenner, & 
Bent, 1975) subprograms. 

In conducting the generalizability analysis, the item facet 
was a random facet since the rating forms used in this study were 
constructed by randomly sampling items from the original CES item
set. The facets counselor, supervisor, and occasion were
treated as random facets in that counselors, supervisors, and 
occasions not observed in the present G study could be exchanged 
with those observed in the G study, even though not sampled 
randomly (Webb, Rowley, & Shavelson, 1988). 

Results 

Supervisor's Ratings of Counselor Effectiveness

To assess the generalizability of supervisor's ratings of
counselor effectiveness, the data were cast in a design with one
object of measurement, counselors, and two facets, occasions and
items, in the design over measures. Although counselors were
nested within supervisors, GENOVA (Crick & Brennan, 1984) does
not permit an object of measurement to be nested within another
factor; thus, supervisors was not included as a facet in this
analysis.

The mean squares and estimated variance
components are presented in Table 1. In this G study, the
predominant sources of variance are those involving counselor as
a main effect (C: 30.74%) or as a member of an interaction term
(CO: 13.96%, CI: 7.40%, and COI: 40.56%). These four sources of
variance, taken together, account for 92.66% of the total
variance. Occasions (O: 3.54%), items (I: 3.64%), and their
interaction (OI: 0.16%), when taken as a set, accounted for 7.34%
of the variance.

The major question to be addressed through D study analyses
was whether greater generalizability could be obtained by
increases in the number of items given in a single administration
(the classical strategy for improving the measurement of
counselor effectiveness) or by augmenting the number of occasions
on which the items are administered. In addition, the question
of whether greater benefit is obtained by using randomly sampled
items (items nested in observations) rather than a fixed sample
of items (items crossed with observations) was of interest.

Effect of Changes in the Number of Items and Occasions. To
study the effect of changes in the number of items and occasions,
an arbitrary minimum case of four items administered on two
occasions was selected. Items and occasions were then
successively doubled until an arbitrary maximum case of 64 items
administered on 8 occasions was reached. Generalizability
coefficients for designs featuring various combinations of sample 
sizes for items crossed with occasions are presented in the first 
half of Table 2. Increases in either the number of items or in 
the number of occasions produce increases in the generalizability 
coefficient. However, the generalizability increases more 
rapidly with increases in the number of occasions than with 
increases in the number of items. Assuming .90 as a minimally 
acceptable value for the generalizability of supervisor ratings 
of counselors, one would collect ratings on at least 8 occasions 
using an instrument consisting of at least 8 items. 
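
The D study projections described above can be sketched as follows. This is a minimal illustration, assuming the standard relative-error generalizability coefficient for a fully crossed counselors x occasions x items design; it is not GENOVA output. The function name and dictionary keys are illustrative. The variance figures are the percentages reported in the text for the G study, and only their ratios matter here.

```python
# Percent-of-variance figures reported in the text for the G study
# (counselors x occasions x items). Only their ratios matter.
VARIANCE = {"c": 30.74, "co": 13.96, "ci": 7.40, "coi": 40.56}


def g_coefficient(n_occasions, n_items, var=VARIANCE):
    """Relative G coefficient for a crossed counselors x occasions x items design."""
    relative_error = (
        var["co"] / n_occasions
        + var["ci"] / n_items
        + var["coi"] / (n_occasions * n_items)
    )
    return var["c"] / (var["c"] + relative_error)


# Projected coefficients for each combination of occasions and items.
for n_o in (2, 4, 8):
    row = [g_coefficient(n_o, n_i) for n_i in (4, 8, 16, 32, 64)]
    print(n_o, " ".join(f"{g:.2f}" for g in row))
```

Under these assumptions the projected values appear to agree, to two decimals, with the crossed-design coefficients in the first half of Table 2. Going from 2 to 8 occasions with 4 items raises the coefficient from about .69 to about .86, while going from 4 to 64 items on 2 occasions raises it only to about .81.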

Fixed vs. Randomly Sampled Items. Most counselor
effectiveness studies use a single rating form consisting of a
fixed list of items. To explore the effect of constructing a
large item pool and randomly sampling a subset of items for each
administration of the rating form, the data were recast into a
design featuring items nested in occasions. Generalizability
coefficients for various combinations of sample sizes for items 
nested in occasions are presented in the second half of Table 2. 
Element by element comparison of the crossed vs. nested designs 
revealed that nesting items in occasions produced a uniform 
decrease in generalizability over that observed for the 
corresponding case using a fixed item set. 
Counselor's Ratings of Supervisor Effectiveness

To assess the generalizability of counselors' ratings of
supervisor effectiveness, the data were cast in a design with one
object of measurement, supervisors, and two facets, occasions and
items, in the design over measures. In this analysis of the
discriminations among supervisors, the factor counselors was
considered to be a measurement source nested within supervisors.
Since not all of the 9 supervisors had complete data sets (a
complete set would consist of counselor ratings from 3 different
counselors x 25 items over 6 occasions), one supervisor was
dropped and ratings from five counselors were discarded, leaving
a data set of 8 supervisors with counselor ratings from 2
counselors consisting of 25 items over 6 occasions. Because no
appreciable difference was found with the counselor effectiveness
data in comparing items crossed with occasions versus items 
nested in occasions, only the analysis of items crossed with 
occasions was conducted with the supervisor effectiveness data. 

Effect of Changes in the Number of Items, Occasions, and
Counselor/Supervisees. Since the analysis of supervisor ratings
of their supervisees revealed that a range of 8 to 32 items
was a sufficient spread to understand the effect of item length,
an arbitrary minimum case of 8 items administered on 2 occasions
by 2 counselor/supervisees was selected. Items, occasions, and
counselor/supervisees were then successively doubled until an
arbitrary maximum case of 32 items administered on 16 occasions
by 16 counselor/supervisees was reached.

The mean squares and estimated variance
components are presented in Table 3. As before, the predominant
sources of variance are those involving counselor as a main
effect (C:S, 10.98%) or in interaction with instrument facets
(CO:S, 19.10%; CI:S, 13.55%; and COI:S, 45.99%). These four
sources of variance, taken together, account for 89.62% of the
variance. Surprisingly, factors involving supervisor as a main
effect (S, 1.66%) and in interaction with instrument facets
(SO, 0.88%; SI, 0.40%; and SOI, 1.85%) only accounted for an
aggregate of 4.79% of the variance. Occasions (O, 0.00%), items
(I, 5.32%), and their interaction (OI, 0.27%), taken as a set,
accounted for 5.60% of the variance.
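
The link between the mean squares and the variance components in Table 3 can be sketched as follows. This is a minimal illustration, assuming the usual expected-mean-square algebra for a balanced, all-random design with counselors nested in supervisors and crossed with occasions and items. It is not a reproduction of GENOVA's internal procedure. The mean squares and sample sizes are taken from Table 3; the function name and keys are illustrative.

```python
# Mean squares from Table 3 (counselors nested in supervisors,
# crossed with occasions and items).
MS = {
    "s": 36.7490, "c:s": 29.5271, "o": 7.2044, "i": 8.5360,
    "so": 7.4936, "si": 1.7805, "co:s": 6.8661, "ci:s": 1.6694,
    "oi": 0.7086, "soi": 0.6517, "coi:s": 0.6032,
}
# Supervisors, counselors per supervisor, occasions, items.
N_S, N_C, N_O, N_I = 8, 2, 6, 25


def variance_components(ms=MS, n_s=N_S, n_c=N_C, n_o=N_O, n_i=N_I):
    """Estimate variance components from mean squares (balanced random model)."""
    res = ms["coi:s"]  # residual: highest-order term confounded with error
    vc = {
        "coi:s": res,
        "soi": (ms["soi"] - res) / n_c,
        "ci:s": (ms["ci:s"] - res) / n_o,
        "co:s": (ms["co:s"] - res) / n_i,
        "oi": (ms["oi"] - ms["soi"]) / (n_s * n_c),
        "si": (ms["si"] - ms["ci:s"] - ms["soi"] + res) / (n_c * n_o),
        "so": (ms["so"] - ms["co:s"] - ms["soi"] + res) / (n_c * n_i),
        "c:s": (ms["c:s"] - ms["co:s"] - ms["ci:s"] + res) / (n_o * n_i),
        "i": (ms["i"] - ms["si"] - ms["oi"] + ms["soi"]) / (n_s * n_c * n_o),
        "o": (ms["o"] - ms["so"] - ms["oi"] + ms["soi"]) / (n_s * n_c * n_i),
        "s": (ms["s"] - ms["c:s"] - ms["so"] - ms["si"]
              + ms["co:s"] + ms["ci:s"] + ms["soi"] - res) / (n_c * n_o * n_i),
    }
    # A negative estimate is conventionally replaced by zero.
    return {source: max(value, 0.0) for source, value in vc.items()}


components = variance_components()
total = sum(components.values())
for source, value in components.items():
    print(f"{source:6s} {value:.4f} {100 * value / total:6.2f}%")
```

Under these assumptions the occasion component comes out negative and is set to zero, which would be consistent with the 0.00% reported for occasions, and the supervisor component is about 0.0218, or roughly 1.66% of the total.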
Generalizability coefficients for designs featuring various
combinations of sample sizes for counselors (with various numbers
of observations nested within counselors) and for items are
presented in Table 4. Increases in either the number of items,
the number of occasions, or the number of counselor/supervisees
produce increases in the generalizability coefficient. The
generalizability increases least rapidly with increases in the
number of items and most rapidly with increases in the number of
counselors. With ratings from 16 counselors per supervisor on at
least 16 occasions using an instrument consisting of 32 items,
only marginal generalizability (G = 0.66) is achieved. The
pattern of change in this matrix of coefficients suggests that
greater generalizability would be achieved by further increases
in the number of counselors providing ratings, and secondarily by
further increases in the number of occasions on which ratings are
collected. The classical strategy of adding items to the rating 
instrument would clearly not be supported by the data in this 
case. 
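
The projections for supervisors as the object of measurement can be sketched in the same way. This is a minimal illustration, assuming the standard relative-error coefficient for a design with counselors nested in supervisors and crossed with occasions and items; it is not GENOVA output. The variance figures are the percentages reported in the text, and the function name and keys are illustrative.

```python
# Percent-of-variance figures reported in the text (Table 3).
# Only their ratios matter.
VARIANCE_S = {
    "s": 1.66, "c:s": 10.98, "so": 0.88, "si": 0.40,
    "co:s": 19.10, "ci:s": 13.55, "soi": 1.85, "coi:s": 45.99,
}


def g_supervisors(n_counselors, n_occasions, n_items, var=VARIANCE_S):
    """Relative G coefficient for discriminating among supervisors."""
    n_c, n_o, n_i = n_counselors, n_occasions, n_items
    relative_error = (
        var["c:s"] / n_c
        + var["so"] / n_o
        + var["si"] / n_i
        + var["co:s"] / (n_c * n_o)
        + var["ci:s"] / (n_c * n_i)
        + var["soi"] / (n_o * n_i)
        + var["coi:s"] / (n_c * n_o * n_i)
    )
    return var["s"] / (var["s"] + relative_error)


# Doubling counselors helps more than doubling occasions or items.
print(round(g_supervisors(4, 4, 8), 2))    # baseline case
print(round(g_supervisors(8, 4, 8), 2))    # counselors doubled
print(round(g_supervisors(4, 8, 8), 2))    # occasions doubled
print(round(g_supervisors(4, 4, 16), 2))   # items doubled
```

Under these assumptions the largest case (16 counselors, 16 occasions, 32 items) gives about .66, in line with the value cited above. Because the inputs are rounded percentages, individual cells may differ from Table 4 by about .01.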

Classical Instrument Performance Indices

To permit relating the findings of this study directly to
traditional work in the field of counselor and supervisor
effectiveness, several analyses were performed based on classical
true-score test theory. Item homogeneities were computed by
Cronbach's alpha, Pearson product-moment correlations were computed
among ratings made of supervisee and supervisor effectiveness
across occasions, and tests were made to determine whether there
were mean differences between supervisor ratings of their 
supervisees and between ratings received by supervisors from 
their supervisees. 

Item Homogeneities. Cronbach's alpha reliabilities and
intercorrelations among variables were computed and cast as a
multitrait-multimethod matrix as presented in Table 5. At each
occasion, the supervisor's ratings of the counselor and the
counselor's ratings of the supervisor yielded remarkably high
scale homogeneities (median: .95). This finding is consistent
with previous research on the Ivey scales (Wilson & Yager, 1987).
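
For readers unfamiliar with the homogeneity index used here, a minimal sketch of Cronbach's alpha follows. The formula is the standard one; the small rating matrix is hypothetical and is not data from this study.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(ratings[0])
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in ratings]) for j in range(n_items)]
    # Variance of the total (summed) score across respondents.
    total_variance = variance([sum(row) for row in ratings])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical data: 4 respondents rating 3 items.
example = [
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
]
print(round(cronbach_alpha(example), 2))
```

Alpha approaches 1.0 when respondents are ordered the same way by every item, which is the sense in which the scale homogeneities reported in Table 5 are high.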

Correlations among Ratings. Supervisor ratings of
counselors across occasions were all highly correlated (all were
significant at p < .01, half were significant at p < .001).
Their values ranged from .56 to .80 with a median of .65. These
correlations tended to follow a pattern: ratings made during
adjacent time periods tended to be more highly correlated with
each other than were ratings made at periods more widely
separated in time. Thus, although the supervisor's view of the
counselor was relatively consistent from one time period to the
next (median correlation between adjacent time periods: .72),
there was a gradual change over time such that the supervisor's 
initial rating accounted for 36% of the variance in the 
supervisor's final rating of the counselor. 

Counselor ratings of supervisors were less well correlated 
(only half were significant at p < .05). Their values ranged
from .05 to .89 with a median of .37. These correlations were 
uniformly patterned such that the magnitude of the correlation 
decreased with increases in temporal distance. The counselor's 
view of the supervisor was also relatively consistent from one 
time period to the next (median correlation between adjacent time 
periods: .74), but there was much more change in view over time. 
The counselor's initial rating only accounted for 2.5% of the 
variance in the counselor's final rating of the supervisor. 
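
The summary statistics quoted for the across-occasion correlations can be checked against Table 5 with a short sketch. The correlations are the off-diagonal entries of the two same-rater blocks in Table 5; the dictionary layout and function name are illustrative.

```python
from statistics import median

# Same-rater correlations from Table 5, keyed by (later occasion, earlier occasion).
SUPERVISOR_RATINGS = {
    (2, 1): .72,
    (3, 1): .70, (3, 2): .65,
    (4, 1): .62, (4, 2): .57, (4, 3): .58,
    (5, 1): .74, (5, 2): .64, (5, 3): .72, (5, 4): .80,
    (6, 1): .57, (6, 2): .68, (6, 3): .56, (6, 4): .64, (6, 5): .78,
}
COUNSELOR_RATINGS = {
    (2, 1): .59,
    (3, 1): .28, (3, 2): .74,
    (4, 1): .29, (4, 2): .54, (4, 3): .53,
    (5, 1): .20, (5, 2): .37, (5, 3): .36, (5, 4): .89,
    (6, 1): .05, (6, 2): .17, (6, 3): .12, (6, 4): .64, (6, 5): .76,
}


def summarize(correlations):
    """Return (median of all correlations, median of adjacent-occasion correlations)."""
    adjacent = [r for (later, earlier), r in correlations.items() if later - earlier == 1]
    return median(correlations.values()), median(adjacent)


print(summarize(SUPERVISOR_RATINGS))  # (0.65, 0.72)
print(summarize(COUNSELOR_RATINGS))   # (0.37, 0.74)
```

For both rating directions the adjacent-occasion median is higher than the overall median, which is the temporal pattern described above.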

There was little relationship between the supervisor's
ratings of the counselor and the counselor's ratings of the
supervisor. These multi-rater/multi-occasion correlations
ranged from -.31 to .35 with a median of .08. None were
significant at p < .05. No clear pattern emerged among the
correlations. These correlations suggest that there was no
systematic mutuality between supervisors' and counselors' ratings
of one another.

Differences Among Supervisor's Ratings of Their Counselors.
To determine whether supervisors differed, the average rating
given to their supervisees at each occasion was calculated. Six
analyses of variance were computed, each featuring one factor in
the design over subjects, supervisors. The results of this
analysis are presented in Table 5. Initially, supervisors
differed in the mean effectiveness rating given to their set of
supervisees; however, over the six occasions, this difference
diminished such that by the sixth occasion, no significant
difference was found. Inspecting the grand mean across
occasions, it is interesting to note that on each successive
occasion, the overall evaluation of the entire set of supervisees
became more positive (reflected by a steadily decreasing score).

Differences Among Counselor's Ratings of Their Supervisors.
A different picture emerged when mean ratings of supervisor
effectiveness were compared across supervisors. To determine
whether supervisors differed in the average rating received from
their supervisees at each occasion, six analyses of variance were
computed, each featuring one factor in the design over subjects,
supervisors. The results of this analysis are also presented in
Table 5. No significant difference between supervisors was
observed for any occasion when mean supervisee ratings of the
supervisors' effectiveness were compared. However, inspection of
the grand mean reveals that in general, there was slight but
steady improvement over time (as reflected by decreasing scores)
in the supervisee's perception of the supervisor.

Discussion 

In response to the initial questions raised in this study,
it appears relatively clear that rating scales similar to those
that supervisors commonly employ to rate the effectiveness of
counselor trainees can be developed to allow for good
generalizability. Over 90% of the variance in supervisors'
ratings of counselors is attributable to differences among
counselors (C, 30.74%) and interactions of counselors with
occasions and/or items (CO, 13.96%; CI, 7.40%; COI, 40.56%).
Increasing the number of occasions for observation and rating of
a trainee is, apparently, more important than increasing the
number of items in the rating scale. In fact, an 8-item scale
administered over eight occasions is only slightly less reliable
than a 64-item scale administered over the same number of
occasions.

The proportion of variance accounted for by differences 
among counselors nested within supervisors (C:S, 10.98%) was
considerably greater than that accounted for by differences among
the supervisors (S, 1.66%). Furthermore, the interactions
between occasions and/or items with counselors (CO:S, 19.10%;
CI:S, 13.55%; COI:S, 45.99%) accounted for considerably larger
proportions of the variance than do the corresponding
interactions with supervisors (SO, 0.88%; SI, 0.40%; SOI, 1.85%).
These findings are consistent with Stoltenberg and Delworth's
(1987) speculations that trainee ratings may well be much more
variable than those of supervisors since they are often a
function of comfort with supervision, not of its
effectiveness. It certainly seems that the counselor-in-training's
ratings of the supervisor are much less generalizable.

Additionally, there is little correlation among the 
counselors' ratings of supervisor effectiveness across time. The 
counselors generally view their supervisors as very competent
(overall means are consistently positive); however, their reasons
for rating the supervisor appear to differ from time to time
(very low correlations between ratings across time).

In evaluating the effectiveness of supervisors, especially 
heterogeneous samples of supervisors (e.g., supervisor trainees), 
contrary to what one might expect from classical theory, 
lengthening the test does not appear to be the best way to 
improve the dependability of discriminations. Greater 
dependability derives from using a greater number of counselors 
per supervisor and collecting ratings on a greater number of 
occasions. Although this approach is most desirable from a 
statistical point of view, it is, unfortunately, more expensive 
in terms of time and effort on the part of the supervisors and 
counselors. However, failure to take these sources of error into 
account would run the risk of making decisions about relative 
effectiveness of supervisors based on error-prone data. 
Limitations 

This study of supervisor evaluations of prepracticum
counselors and their evaluations of their supervisors was
conducted within a single counselor training program under actual
counselor training conditions. The findings may not apply (a) to
training programs that draw from a different population of trainees,
(b) to counselor-trainees at higher levels of training, (c) to
faculty rather than student supervisors, or (d) to counselors and
supervisors in clinical rather than training settings.

Findings relating to the generalizability of counselor
evaluations of supervisor effectiveness may be spuriously 
conservative due to a restriction of range in supervisor ability. 
Unlike field conditions, these data were collected in a training 
institution using supervisors who were undergoing training in 
supervision and were, themselves, being supervised. Each 
supervisor had a clear idea of what was to be the focus of the 
student's learning in the prepracticum class, and it is likely
that each supervisor approached the student supervision sessions
with similar goals and objectives. Under such conditions, it is 
reasonable to assume that their performances as supervisors would 
be more similar than would the performances of randomly sampled 
field supervisors. Under ideal circumstances, study of the 
generalizability of supervisor effectiveness ratings would be 
conducted with a heterogeneous, rather than homogeneous, sample 
of supervisors. Since supervisor homogeneity increases the 
difficulty of reliable differentiation among supervisors, it is 
likely that in field situations, with a more heterogeneous sample
of supervisors, one would not need as many counselors and 
occasions as were indicated in these data. 

This same homogeneity of supervisors also provides a 
possible explanation for the relatively small intercorrelations 
between the repeated ratings of the supervisor by the counselors. 
The larger the variability of the supervisors (as targets of the 
rating process), the larger the expected reliabilities. A 
restricted range may, thereby, have depressed the possible
intercorrelations across occasions.

Cautions 

Like prophecies made through use of the Spearman-Brown
formula, the prophecies made in generalizability analysis relate
to conditions not actually sampled (D study results). Therefore,
these data require replication to determine whether the benefits
anticipated will be realized in subsequent research studies.
This caution is especially true when the original G study was based
on small samples for some or all factors under investigation, as
was the case in the present study. Ideal G studies involve large
samples over all facets. In this study, conducted within a single
counselor training program under realistic training conditions,
only 9 supervisors and 23 counselors-in-training were available
for study within a single year. 
Suggestions for Future Research 

There are a number of additional research studies that would 
follow directly from this investigation. Among the possible next 
steps would be the following: 

1. Add a source facet (supervisor rating, self-rating,
client rating, observer rating) to the design to permit
study of the facets: counselors x sources x occasions x
items.

2. Increase the G study sample sizes (e.g., at least 3 or 4
counselors per supervisor, at least 6 observations, and 10
or more supervisors).

3. Use a smaller instrument (e.g., a random sample of 10 
items from each of Ivey's two forms pooled into a single 
form, or a simple random sample of 15 or 20 items).

4. Collaborate with other counselor training institutions 
to aggregate greater numbers of supervisors and counselors. 

5. Collaborate with community agencies to study a 
heterogeneous group of supervisors employing a design 
featuring facets for supervisor x counselor:supervisor x
occasions x items.

This study has presented perhaps the first application of 
generalizability analysis to the ratings of counselors and 
supervisors. As the first investigation in this area, it leaves 
a number of very interesting possible directions of research for 
the future. 
Table 2

Generalizability Coefficients for Supervisor's Ratings of
Counselor Effectiveness (n_c = 21 Counselors).

D Study: Counselors x Occasions x Items Design

                          Items
Occasions      4       8      16      32      64

    2         .69     .75     .78     .80     .81
    4         .80     .84     .87     .88     .89
    8         .86     .90     .92     .94     .94

D Study: Counselors x Occasions x Items:Occasions Design

                          Items
Occasions      4       8      16      32      64

    2         .67     .72     .75     .76     .77
    4         .80     .84     .86     .87     .87
    8         .89     .91     .92     .93     .93

Table 3

Expected Mean Squares for Counselor's Ratings of Supervisors'
Effectiveness (G Study: Supervisors x Counselors:Supervisors x
Occasions x Items Design) (n_s = 8 Supervisors, n_c = 2 Counselors
per Supervisor, n_o = 6 Occasions, n_i = 25 Items).

Sources      df         SS         MS        EMS      %EMS

S             7     257.24    36.7490     0.0218     1.66%
C:S           8     236.22    29.5271     0.1440    10.98%
O             5      36.02     7.2044    (0.0)       0.00%
I            24     204.86     8.5360     0.0698     5.32%
SO           35     262.27     7.4936     0.0116     0.88%
SI          168     299.12     1.7805     0.0052     0.40%
CO:S         40     274.64     6.8661     0.2505    19.10%
CI:S        192     320.53     1.6694     0.1777    13.55%
OI          120      85.03     0.7086     0.0036     0.27%
SOI         840     547.42     0.6517     0.0242     1.85%
COI:S       960     579.11     0.6032     0.6032    45.99%

Total      2399    3102.48

Table 4

Generalizability Coefficients for Counselor's Ratings of
Supervisor Effectiveness (n_s = 8 Supervisors, n_c = 2 Counselors
per Supervisor, n_o = 6 Occasions, n_i = 25 Items).

D Study: Supervisors x Counselors:Supervisors x
Occasions x Items Design

                                   Items
Counselors   Occasions      8       16       32

     2           4        0.14     0.16     0.16
     2           8        0.17     0.18     0.19
     2          16        0.19     0.20     0.21
     4           4        0.25     0.26     0.27
     4           8        0.29     0.30     0.31
     4          16        0.31     0.33     0.34
     8           4        0.38     0.40     0.42
     8           8        0.43     0.46     0.47
     8          16        0.47     0.49     0.50
    16           4        0.52     0.55     0.56
    16           8        0.59     0.61     0.62
    16          16        0.62     0.64     0.66

Table 5

Scale Homogeneities, Correlations between Ratings, and Tests for Differences among Supervisors for (a) Client's
Ratings of Counselor Effectiveness, (b) Supervisor's Ratings of Counselor Effectiveness, and (c) Counselor's
Ratings of Supervisor Effectiveness (n = 21 Counselors).

                            S>C^a                                     C>S^b
              1      2      3      4      5      6      1      2      3      4      5      6

S>C 1       (.96)
S>C 2        .72   (.96)
S>C 3        .70    .65   (.95)
S>C 4        .62    .57    .58   (.97)
S>C 5        .74    .64    .72    .80   (.96)
S>C 6        .57    .68    .56    .64    .78   (.95)
C>S 1        .32    .17   -.08   -.08    .07    .08   (.93)
C>S 2        .25    .18    .09   -.31   -.19   -.18    .59   (.91)
C>S 3        .21    .15    .04    .05    .35    .30    .28    .74   (.95)
C>S 4        .15    .08   -.10   -.03    .17    .22    .29    .54    .53   (.94)
C>S 5        .24   -.08   -.18    .06    .21    .20    .20    .37    .36    .89   (.96)
C>S 6        .12   -.17   -.30   -.07   -.06   -.18    .05    .17    .12    .64    .76   (.95)

Grand M     2.99   2.71   2.61   2.55   2.26   2.23   1.90   1.90   1.81   1.78   1.67   1.60
Grand SD    0.82   0.86   0.81   0.94   0.78   0.86   0.60   0.64   0.72   0.73   0.65   0.64
F(8,14)^c   6.39   9.11   3.90   2.65   3.46   1.74   0.83   1.06   1.18   1.25   1.49   0.80
p            .001   .001   .01    .05    .02    .19    .59    .44    .38    .34    .25    .61

a S>C: Supervisor (S) ratings of their Counselor Supervisees (C) over six counseling/supervision sessions.
b C>S: Counselor (C) ratings of their Supervisor (S) over six counseling/supervision sessions.
c Tests for differences among supervisors' mean ratings of their counselor/supervisees and of their mean rating
received from the counselor supervisees.

Critical Values: r = .43, p < .05; r = .56, p < .01; r = .66, p < .001.

Note: Entries on principal diagonal (in parentheses) are homogeneity estimates.



